[MongoDB Storage] Handle "topology is closed" #278
🦋 Changeset detected. Latest commit: ed313da. The changes in this PR will be included in the next version bump. This PR includes changesets to release 16 packages.
Pull Request Overview
This PR adds handling for unrecoverable MongoDB "topology is closed" errors by explicitly connecting on startup and propagating fatal errors to gracefully exit the process with custom exit codes.
- Introduces a new error code (PSYNC_S2402) for MongoDB connection failures.
- Registers fatal error listeners in the storage engine and storage provider.
- Updates process exit codes and corresponding log messages in CLI entry and lifecycle management.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/service-errors/src/codes.ts | Added new error code PSYNC_S2402 for MongoDB connection failures. |
| packages/service-core/src/system/ServiceContext.ts | Registered a listener to propagate storage fatal errors. |
| packages/service-core/src/storage/StorageProvider.ts | Extended the ActiveStorage interface to include onFatalError callback. |
| packages/service-core/src/storage/StorageEngine.ts | Hooked up fatal error notification from the active storage. |
| packages/service-core/src/entry/cli-entry.ts | Changed error log and exit code to indicate fatal startup errors. |
| modules/module-postgres/src/replication/WalStream.ts | Updated logger usage to use the instance logger. |
| modules/module-mongodb-storage/src/storage/implementation/MongoStorageProvider.ts | Explicitly connects to MongoDB on startup and handles 'topologyClosed' errors. |
| libs/lib-services/src/system/LifeCycledSystem.ts | Added stopWithError method with a distinct exit code for fatal errors. |
| .changeset/yellow-icons-cross.md | Updated the changeset for affected packages. |
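
As a rough illustration of the propagation path summarized in the table, a fatal error raised by the storage layer could be forwarded to a listener that stops the system and exits with a distinct code. This is only a sketch: `FatalErrorSource`, `EXIT_CODE_FATAL_ERROR` and its value, and the injected `stop`/`exit` parameters are hypothetical and are not taken from the PR; only the names `onFatalError` and `stopWithError` come from the summary.

```typescript
type FatalErrorListener = (error: Error) => void;

// Hypothetical stand-in for a storage component that can report fatal errors.
class FatalErrorSource {
  private listeners: FatalErrorListener[] = [];

  // Registers a listener and returns a function that unregisters it.
  onFatalError(listener: FatalErrorListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Notifies every registered listener of an unrecoverable error.
  reportFatalError(error: Error): void {
    this.listeners.forEach((listener) => listener(error));
  }
}

// Hypothetical value; the exit codes actually used by the PR are not shown here.
const EXIT_CODE_FATAL_ERROR = 150;

// Runs the normal shutdown routine, then exits with the fatal-error code.
// `exit` is injected so the sketch can be exercised without ending the process;
// in a real service it would be something like (code) => process.exit(code).
async function stopWithError(
  stop: () => Promise<void>,
  exit: (code: number) => void
): Promise<void> {
  try {
    await stop();
  } finally {
    exit(EXIT_CODE_FATAL_ERROR);
  }
}
```

A service context could then wire the two together with something like `source.onFatalError(() => stopWithError(stop, exit))`, so that a distinct exit code tells the container manager the process ended on a fatal error rather than a normal shutdown.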
Looks good to me :)
Handle "Topology is closed" errors by exiting the process, in two ways:
- Explicitly calling `client.connect()` on startup, allowing us to catch the error.
- Handling `topologyClosed` errors after startup, in case one does get through.

These errors are not recoverable: retrying does not resolve them. They typically only happen on start-up. If the connection is lost at a later point, we get a different error that is automatically retried. By exiting the process, a container management system such as Kubernetes or Docker Compose can restart the process using an appropriate back-off.
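
The two steps described above might look roughly like the following. This is a minimal sketch, not the PR's implementation: `StorageClient` is a hypothetical interface modeled on the MongoDB Node.js driver's `MongoClient` (which has `connect()` and emits a `topologyClosed` event), and `connectStorage`, `FatalErrorHandler`, and the `isShuttingDown` guard are assumptions added for illustration.

```typescript
// Hypothetical minimal client shape, modeled on the MongoDB driver's MongoClient.
interface StorageClient {
  connect(): Promise<unknown>;
  on(event: 'topologyClosed', listener: () => void): unknown;
}

// Hypothetical callback used to report unrecoverable errors to the caller.
type FatalErrorHandler = (error: Error) => void;

async function connectStorage(
  client: StorageClient,
  onFatalError: FatalErrorHandler,
  // Assumption: the event also fires on a deliberate close, so it is ignored
  // while the service is shutting down on purpose.
  isShuttingDown: () => boolean = () => false
): Promise<void> {
  // Register the listener first, in case the topology closes after startup.
  client.on('topologyClosed', () => {
    if (!isShuttingDown()) {
      onFatalError(new Error('MongoDB topology is closed'));
    }
  });

  // Connect explicitly so a startup failure rejects here, where the caller
  // can catch it, rather than surfacing later on the first query.
  await client.connect();
}
```

The caller would typically catch a rejection from `connectStorage` during startup and exit, and pass an `onFatalError` handler that stops the process for errors that arrive later.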
Some other options to consider:
This primarily impacts the API process - the replication process already had similar behavior on connection errors during startup.
This does not affect connections to the source database, only connections to the storage database. Source database errors in replication are handled in the replication loop. Source database errors in the API process are much less significant, since they only affect write checkpoints. We can investigate better ways to handle that later, but I haven't seen any reports of that being an issue yet.